Abstract. In this work we focus on the determination of the relative distributions of young, intermediate-age and old populations of stars in galaxies. Starting from a grid of theoretical population synthesis models, we constructed a set of model galaxies with a distribution of ages, metallicities and intrinsic reddening. Using this set, we have explored a new fitting method that presents several advantages over conventional methods. We propose an optimization technique that combines active learning with an instance-based machine learning algorithm. Experimental results show that this method can estimate the physical parameters of the stellar populations with high speed and accuracy.
The availability, for the first time, of huge astronomical spectroscopic surveys such as the SDSS, with more than $10^{6}$ spectra, will allow the determination of intrinsic physical parameters for a large number of galaxies, including the age distribution (or star formation history) and the metallicity distribution of their stellar populations.

The importance of accurate knowledge of these parameters for cosmological studies and for understanding galaxy formation and evolution cannot be overestimated. Template fitting has been used to estimate the distributions of age and metallicity from spectral data. Although this technique achieves good results, it is very expensive in terms of computing time and can therefore be applied only to small samples.

Starting from a grid of theoretical population synthesis models, we constructed a set of model galaxies with a distribution of ages, metallicities and intrinsic reddening. Using this set, we have explored a new method that aims to maximize both speed and accuracy. Our proposed technique combines standard least-squares fitting with an active, instance-based machine learning algorithm. Experimental results show that this method can estimate the physical parameters of the stellar populations with high speed and accuracy. Based on empirical evidence, we believe that this method can be applied with equal success to other astronomical problems, reducing the computational cost and thus making it possible to analyze larger quantities of astronomical data.

2. Description of the Models 

For the spectral synthesis of simple stellar populations the atmospheric models 
have been folded with the predicted number of stars along isochrones of given 
age and metal content (Bressan et al. 1994). The atmosphere models have been 
inserted in low resolution Kurucz models (Kurucz 1993) in order to preserve the 
complete energy distribution. 

The models have the following characteristics: 

• Ages range from $10^{6}$ yr to $2 \times 10^{10}$ yr in logarithmic steps:

$[10^{6}, 10^{8}, 10^{8.3}, 10^{8.6}, 10^{9}, 10^{9.6}, 10^{9.78}, 10^{10}, 10^{10.3}]$ yr

• Metallicity takes the values Z = [0.0004, 0.004, 0.008, 0.02, 0.05] (metal mass fraction; Z = 0.02 is solar)

• The spectra are smoothed to the desired resolution.

For the present experiments we used solar metallicity (Z = 0.02) and a resolution of 20 Å.

3. The Proposed Solution 

Given an observed galaxy spectrum, we would like to determine the relative distribution of ages and the intrinsic reddening. We restricted the problem to finding the contributions of three age groups: a starburst of age 1 Myr, an intermediate-age population with ages between 100 Myr and 1000 Myr, and an old population with ages greater than 1000 Myr. Each of the three populations is affected by a reddening law of the same form, defined as follows:

$$R(c, \lambda) = 1 - e^{\lambda c} \tag{1}$$

where $c$ is the free parameter of each stellar population and $\lambda$ is the wavelength, in this case ranging from 890 Å to 2.301 μm. In order to determine the free reddening parameters and the relative contributions, we pose the problem as an optimization problem in which a modified version of a machine learning algorithm is trained to estimate the reddening parameters of the three populations. Once we have an estimate of the reddening, we can compute the relative contributions of ages, $\vec{A}$, with a pseudo-inverse matrix as follows:

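Let $M = [\vec{m}_1, \ldots, \vec{m}_9]$ be the grid of our nine theoretical models described earlier, let $\vec{o}$ be the observed spectrum, and let $\vec{c} = (c_1, c_2, c_3)$ be the vector of free reddening parameters predicted by the learning algorithm for $\vec{o}$. We can compute $S = [\vec{m}^{\,r}_1, \ldots, \vec{m}^{\,r}_9]$ by applying the reddening function to the theoretical models as defined in equation (2):

$$m^{r}_i(\lambda) = m_i(\lambda) \times R(c_i, \lambda) \tag{2}$$

where, with a slight abuse of notation, $c_i$ stands for the reddening parameter of the population (young, intermediate or old) to which model $i$ belongs.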

We model the observed spectrum $\vec{o}$ as the product of $S$ and the unknown relative contributions $\vec{A}$:

$$\vec{o} = S\,\vec{A} \tag{3}$$

$$\vec{A} = S^{+}\,\vec{o} \tag{4}$$

Hence, by computing $S^{+}$, the pseudo-inverse of $S$, we can determine the relative contributions of ages as the least-squares solution of equation (3), as equation (4) shows. The following section introduces the optimization procedure used in this work.
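Equations (1) to (4) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the model spectra are assumed to be sampled on a common wavelength grid in Å and stored as the columns of an array `M`, and the grouping of the nine models into the three populations (`POP_OF_MODEL`, with the $10^{9}$ yr model counted as intermediate) is a hypothetical choice. The random `M` in the demonstration merely stands in for the real population synthesis grid.

```python
import numpy as np

# Hypothetical grouping of the nine model ages into the three populations:
# index 0 -> young (1 Myr), 1-4 -> intermediate (100-1000 Myr), 5-8 -> old (> 1000 Myr).
POP_OF_MODEL = np.array([0, 1, 1, 1, 1, 2, 2, 2, 2])


def reddening(c, lam):
    """Eq. (1): R(c, lambda) = 1 - exp(lambda * c), with lambda in Angstroms."""
    return 1.0 - np.exp(lam * c)


def reddened_models(M, c, lam, pop_of_model=POP_OF_MODEL):
    """Eq. (2): column i of S is m_i(lambda) * R(c, lambda), using the c of model i's population."""
    c_per_model = np.asarray(c, dtype=float)[pop_of_model]     # shape (9,)
    return M * reddening(c_per_model[None, :], lam[:, None])   # shape (n_lambda, 9)


def fit_contributions(o, S):
    """Eq. (4): least-squares contributions A = S^+ o."""
    return np.linalg.pinv(S) @ o


def forward_model(o, M, c, lam, pop_of_model=POP_OF_MODEL):
    """Return the contributions A and the model spectrum S A of eq. (3) for reddening vector c."""
    S = reddened_models(M, c, lam, pop_of_model)
    A = fit_contributions(o, S)
    return A, S @ A


# Demonstration on a random stand-in grid (NOT real population synthesis models).
rng = np.random.default_rng(0)
lam = np.linspace(890.0, 23010.0, 500)                  # 890 A to 2.301 micron
M = rng.uniform(0.5, 1.5, size=(lam.size, 9))
A_true = np.array([0.1132, 0.1763, 0, 0, 0, 0, 0, 0, 0.7105])
c_true = np.array([-0.0004, -0.0011, -0.0008])
o = reddened_models(M, c_true, lam) @ A_true            # noiseless "observed" spectrum
A_hat, o_hat = forward_model(o, M, c_true, lam)
print(np.round(A_hat, 4))
```

With the correct reddening vector and noiseless data, the pseudo-inverse recovers the contributions exactly. Note that `np.linalg.pinv` gives the unconstrained least-squares solution, so with real, noisy data some contributions could come out slightly negative; the text does not say whether a constraint is applied.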

4. The Optimization Procedure 

We are interested in the problem of finding the parameters of a known analytic function that best match an observation. Let $\vec{o}$ be the observed galactic spectrum, and let $f(\vec{r})$ be a function with the same dimensionality as $\vec{o}$. The goal of the optimization procedure is to obtain the value $f(\vec{r}^{*})$ that minimizes the error $e = |\vec{o} - f(\vec{r})|$. In order to solve the problem more efficiently, we pose it as a learning problem: a learning algorithm learns the reddening parameters $\vec{r}^{*}$, and a forward model computes $f(\vec{r}^{*})$. The training set used by the algorithm, $\{(\vec{t}_i, \vec{r}_i)\}$, is formed by randomly generated reddening parameters $\vec{r}_i$ and their corresponding galactic spectra $\vec{t}_i$, for which the contributions of ages were also generated randomly. Its test set consists of the galactic spectra to be analyzed, denoted here by $\vec{o}_1, \ldots, \vec{o}_n$, and it outputs estimates $\vec{r}_1, \ldots, \vec{r}_n$ that are expected to minimize the errors $e_1, \ldots, e_n$. When a new set of solutions $\vec{r}_1, \ldots, \vec{r}_n$ is proposed by the algorithm, we compute the corresponding spectra $f(\vec{r}_1), \ldots, f(\vec{r}_n)$ using equations (2), (3) and (4), use the new pairs $(f(\vec{r}_i), \vec{r}_i)$ to augment the training set, and continue this iterative process until convergence is attained. Since this type of active learning adds to the training set examples that are progressively closer to the points of interest, the errors tend to decrease from one iteration to the next. The pseudocode of the algorithm is the following:

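1. Randomly generate an initial set of parameter vectors $\vec{r}_1, \ldots, \vec{r}_m$ and compute their corresponding spectra $\vec{t}_1, \ldots, \vec{t}_m$.

2. Let $P = \{(\vec{t}_1, \vec{r}_1), \ldots, (\vec{t}_m, \vec{r}_m)\}$ be the initial training set.

3. Let $T = \{\vec{o}_1, \ldots, \vec{o}_n\}$ be the test set.

4. While $T$ is not empty:

   1. Train an approximator $\mathcal{A}$ using $P$ as the training set.

   2. For each $\vec{o}_i$ in $T$:

      • Use $\mathcal{A}$ to predict $\vec{r}_i$.

      • Generate $f(\vec{r}_i)$.

      • $P = P \cup \{(f(\vec{r}_i), \vec{r}_i)\}$.

      • If $|\vec{o}_i - f(\vec{r}_i)| <$ threshold, remove $\vec{o}_i$ from $T$.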
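The loop in steps 1 to 4 can be sketched in Python with NumPy as follows. This is an illustration under assumptions rather than the authors' implementation: the names `active_learning_fit`, `local_linear_predict` and `forward` are hypothetical, the approximator is a plain affine least-squares fit on the `k` nearest training spectra (a simple stand-in for the Locally Weighted Linear Regression used in this work), and a `max_iter` cap is added so the loop always terminates. The demonstration uses a linear toy forward model, for which one iteration already recovers the parameters, so the mechanics are easy to check.

```python
import numpy as np


def local_linear_predict(X, Y, x, k=10):
    """Stand-in approximator: affine least-squares fit on the k training spectra nearest to x."""
    idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    Xk = np.hstack([X[idx], np.ones((idx.size, 1))])
    # rcond drops numerically zero singular values (spectra lie on a low-dimensional manifold).
    coef, *_ = np.linalg.lstsq(Xk, Y[idx], rcond=1e-10)
    return np.append(x, 1.0) @ coef


def active_learning_fit(observed, forward, r_init, t_init, threshold=1e-6, max_iter=50, k=10):
    """Steps 1-4: P holds (spectrum, parameters) pairs; T holds the spectra still to be fitted."""
    X, Y = list(t_init), list(r_init)                 # step 2: initial training set P
    pending = dict(enumerate(observed))               # step 3: test set T
    estimates = {}
    for _ in range(max_iter):                         # step 4: while T is not empty (capped)
        if not pending:
            break
        Xa, Ya = np.array(X), np.array(Y)             # step 4.1: approximator uses the current P
        for i, o in list(pending.items()):            # step 4.2
            r = local_linear_predict(Xa, Ya, o, k)    # predict r_i
            f_r = forward(r, o)                       # generate f(r_i)
            X.append(f_r)                             # P = P U {(f(r_i), r_i)}
            Y.append(r)
            estimates[i] = r
            if np.linalg.norm(o - f_r) < threshold:   # converged: remove o_i from T
                del pending[i]
    return estimates, pending


# Hypothetical linear toy forward model: 3 parameters -> 20-point "spectrum".
rng = np.random.default_rng(1)
B, b = rng.normal(size=(20, 3)), rng.normal(size=20)


def forward(r, o):
    """Toy f(r); the real forward model would use o through eqs. (2)-(4)."""
    return B @ r + b


r_init = rng.uniform(-1, 1, size=(30, 3))             # step 1: random parameters ...
t_init = r_init @ B.T + b                             # ... and their spectra
true_r = rng.uniform(-1, 1, size=(5, 3))
estimates, pending = active_learning_fit(true_r @ B.T + b, forward, r_init, t_init)
```

With a nonlinear forward model, such as the one defined by equations (2) to (4), the first predictions are only approximate, and the points added in later iterations are what refine the local fits.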

In this problem the approximator mentioned in step 4.1 is Locally Weighted 
Linear Regression, an instance-based learning algorithm that has shown good 
results in similar optimization problems (Fuentes & Solorio 2003). 
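For reference, a common formulation of Locally Weighted Linear Regression fits, for each query, an affine model by weighted least squares, with weights that decay with distance from the query. The sketch below assumes a Gaussian kernel with a hypothetical bandwidth parameter `tau`. The kernel and bandwidth used in this work are not stated, so treat it as an illustration of the technique rather than the exact approximator.

```python
import numpy as np


def lwlr_predict(X, Y, query, tau=1.0):
    """Locally weighted linear regression with a Gaussian kernel (bandwidth tau is an assumption)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                                   # treat a 1-D array as n samples of one feature
    Y = np.asarray(Y, dtype=float)
    q = np.atleast_1d(np.asarray(query, dtype=float))
    d2 = np.sum((X - q) ** 2, axis=1)                    # squared distances to the query
    sw = np.sqrt(np.exp(-d2 / (2.0 * tau ** 2)))         # square roots of the Gaussian weights
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])        # add an intercept column
    Yw = Y * sw[:, None] if Y.ndim == 2 else Y * sw
    # Weighted least squares: minimize sum_j w_j * (y_j - [x_j, 1] beta)^2.
    beta, *_ = np.linalg.lstsq(Xb * sw[:, None], Yw, rcond=None)
    return np.append(q, 1.0) @ beta


x = np.linspace(0.0, 2.0, 201)
pred_lin = lwlr_predict(x, 2.0 * x + 1.0, 0.5, tau=0.3)  # exact for linear data
pred_sin = lwlr_predict(x, np.sin(x), 1.0, tau=0.1)      # close to sin(1.0)
```

Because a separate weighted fit is solved for every query, no global model is built in advance; this is what makes the method instance-based, and it is why newly added training pairs near a query immediately influence the next prediction.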

5. Experimental Results 
Table 1. Mean absolute errors in reddening parameters

Reddening parameter   r1       r2       r3
MAE × 10^6            0.0149   0.0482   0.4182

In order to evaluate our proposed solution, we randomly generated 500 sets of metallicities and intrinsic reddening parameters and then generated their corresponding spectra. From this set we randomly selected 150 spectra to be used as the test set; the remaining spectra were used as the training set. We repeated this process 10 times and report the overall average. Table 2 presents the mean absolute errors in estimating the age distributions, and Table 1 shows the errors in the reddening parameters. Figure 1 compares a test example with its prediction. On average, it takes 15 seconds to predict the parameters of a single spectrum.

[Figure 1: test and predicted spectra versus wavelength (Å, ×10^4). Test parameters: A = [0.1132, 0.1763, 0, 0, 0, 0, 0, 0, 0.7105], R = [-0.0004, -0.0011, -0.0008]; predicted parameters: A' = [0.1132, 0.1763, 0, 0, 0, 0, 0, 0, 0.7105], R' = [-0.0005, -0.0011, -0.0008].]

Figure 1. Test and predicted spectra, shifted by a constant amount to aid visualization. Vectors A and R are the parameters of the test spectrum, while A' and R' are the corresponding predicted parameters.

Table 2. Mean absolute errors in predicted population fractions

Fraction     A1     A2     A3     A4     A5     A6     A7     A8     A9
MAE × 10^6   4.58   2.92   1.79   2.78   6.48   2.83   5.79   4.33   1.90
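The evaluation protocol described above (500 generated examples, 150 randomly held out for testing, 10 repetitions, averaged mean absolute errors) can be sketched as a generic repeated hold-out loop. The function and parameter names are hypothetical, and the `fit_predict` callable is a placeholder for the full method; the demonstration uses random linear toy data and a global least-squares predictor purely to exercise the code.

```python
import numpy as np


def repeated_holdout_mae(X, Y, fit_predict, n_test=150, n_repeats=10, seed=0):
    """Per-column mean absolute error on n_test held-out rows, averaged over n_repeats random splits."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    maes = []
    for _ in range(n_repeats):
        perm = rng.permutation(len(X))
        test, train = perm[:n_test], perm[n_test:]     # random test/training split
        Y_pred = fit_predict(X[train], Y[train], X[test])
        maes.append(np.mean(np.abs(Y_pred - Y[test]), axis=0))
    return np.mean(maes, axis=0)                       # overall average over the repetitions


def _with_intercept(Z):
    return np.hstack([Z, np.ones((len(Z), 1))])


def linear_fit_predict(X_train, Y_train, X_test):
    """Hypothetical stand-in predictor: global affine least squares."""
    coef, *_ = np.linalg.lstsq(_with_intercept(X_train), Y_train, rcond=None)
    return _with_intercept(X_test) @ coef


rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))                          # toy "spectra"
Y = X @ rng.normal(size=(4, 3)) + 0.5                  # toy parameters, exactly affine in X
mae = repeated_holdout_mae(X, Y, linear_fit_predict)
```

Averaging the per-column errors over several random splits, rather than reporting a single split, reduces the dependence of the reported numbers on which 150 examples happen to be held out.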



6. Conclusions 

We presented an optimization algorithm that can estimate with high accuracy the age distributions and reddening of the stellar populations in galaxies. The algorithm achieves convergence by iteratively creating new data points that lie in the vicinity of the query point. An important feature of this method is its speed: on average it takes 15 seconds to estimate the parameters of a single spectrum. This represents a great advantage over more conventional methods proposed for this problem, which may take several hours to find the solution for a single spectrum.